76 research outputs found

Non-native children speech recognition through transfer learning

Author: Falavigna Daniele
Giuliani Diego
Gretter Roberto
Matassoni Marco
Publication date: 01/01/2018

This work deals with non-native children's speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language. The application scenario is characterized by young students learning English and German and reading sentences in these second languages, as well as in their mother tongue. The paper analyzes and discusses techniques for training effective DNN-based acoustic models starting from children's native speech and performing adaptation with limited non-native audio material. A multi-lingual model is adopted as baseline, where a common phonetic lexicon, defined in terms of the units of the International Phonetic Alphabet (IPA), is shared across the three languages at hand (Italian, German and English); DNN adaptation methods based on transfer learning are evaluated on significant non-native evaluation sets. Results show that the resulting non-native models allow a significant improvement with respect to a mono-lingual system adapted to speakers of the target language.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Automatic Quality Estimation for ASR System Combination

Author: Falavigna Daniele
Jalalvand Shahab
Matassoni Marco
Negri Matteo
Turchi Marco
Publication venue: 'Elsevier BV'
Publication date: 22/06/2017

Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, highly depends on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and it is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

DNN adaptation by automatic quality estimation of ASR hypotheses

Author: Falavigna Daniele
Jalalvand Shahab
Matassoni Marco
Negri Matteo
Turchi Marco
Publication date: 01/01/2016

In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline. Comment: Computer Speech & Language December 201

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Fed-EE: Federating Heterogeneous ASR Models using Early-Exit Architectures

Author: Alessio Brutti
Daniele Falavigna
Mohamed Nabih Ali
Publication date: 01/01/2023

Automatic speech recognition models require large amounts of speech recordings for training. However, the collection of such data is often cumbersome and leads to privacy concerns. Federated learning has been widely used as an effective decentralized technique that collaboratively learns a shared model while keeping the data local on client devices. Unfortunately, client devices often feature limited computation and communication resources, leading to practical difficulties for large models. In addition, the heterogeneity that characterizes edge devices makes it impractical to federate a single model that fits all the different clients. Differently from the recent literature, where multiple different architectures are used, in this work we propose using early-exiting. This brings two benefits: a single model is used on a variety of devices; federating the models is straightforward. Experiments on the public dataset TED-LIUM 3 show that our proposed approach is effective and can be combined with basic federated learning strategies. We also shed light on how to federate self-attention models for speech recognition, for which an established recipe does not exist in the literature.

Archivio della ricerca - Fondazione Bruno Kessler

Microleakage of Direct Restorations. Comparison between Bulk-Fill and Traditional Composite Resins: Systematic Review and Meta-Analysis

Author: Daniele De Santis
Edoardo Falavigna
Francesca Zotti
Giorgia Capocasale
Massimo Albanese
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2021

Since bulk-fill composites were introduced, their use for direct conservative treatment of posterior teeth has spread progressively. Their chemical structure increases the depth of cure and decreases the polymerization contraction; in this manner, bulk-fill composites can be placed in 4 mm single layers and the treatment times are considerably reduced. However, aesthetic and mechanical properties and impact on microleakage of bulk-fill resins are still unclear. This systematic review and meta-analysis aimed to assess the risk of microleakage of direct posterior restorations made of bulk-fill versus conventional composite resins. Searches were performed on the PubMed and Scopus databases. Eligible in vivo studies, published since 2006, were reviewed. Outcomes of marginal discoloration, marginal adaptation, and recurrent caries were considered to conduct the systematic review and meta-analysis. Secondary data were examined to implement additional analysis and assess the risk of bias. Eight randomized clinical trials were analyzed, involving 778 direct restorations. The summary of RCTs led to significant but inconsistent results; marginal discoloration and recurrent caries were found to be improved by 5.1% and 1.4%, respectively, whereas marginal adaptation was reduced by 6.5%. Secondary analyses revealed that follow-up periods, the adhesive system used and the class of carious lesions evaluated are confounding factors, and they result in a risk of bias across studies. Bulk-fill composites are innovative materials for conservative dentistry and they can be used to reduce treatment steps and duration of operative times. There are insufficient data to explore the relationship between bulk-fill composites and microleakage and further investigations are needed.

PubMed Central

Catalogo dei prodotti della ricerca

Automatic assessment of spoken language proficiency of non-native children

Author: Allgaier Katharina
Falavigna Giuseppe Daniele
Gretter Roberto
Matassoni Marco
Tchistiakova Svetlana Lvovna
Publication date: 15/03/2019

This paper describes technology developed to automatically grade Italian students (ages 9-16) on their English and German spoken language proficiency. The students' spoken answers are first transcribed by an automatic speech recognition (ASR) system and then scored using a feedforward neural network (NN) that processes features extracted from the automatic transcriptions. In-domain acoustic models, employing deep neural networks (DNNs), are derived by adapting the parameters of an original out-of-domain DNN.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Driving ROVER with Segment-based ASR Quality Estimation

Author: Falavigna Giuseppe Daniele
Jalalvand Shahab
Negri Matteo
Turchi Marco
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015

ROVER is a widely used method to combine the output of multiple automatic speech recognition (ASR) systems. Though effective, the basic approach and its variants suffer from potential drawbacks: i) their results depend on the order in which the hypotheses are used to feed the combination process, ii) when applied to combine long hypotheses, they disregard possible differences in transcription quality at the local level, iii) they often rely on word confidence information. We address these issues by proposing a segment-based ROVER in which hypothesis ranking is obtained from a confidence-independent ASR quality estimation method. Our results on English data from the IWSLT2012 and IWSLT2013 evaluation campaigns significantly outperform standard ROVER and approximate two strong oracles.

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

EnetCollect in Italy

Author: Daniele Falavigna
Irene Russo
Lionel Nicolas
Luisa Bentivogli
Monti Johanna
Roberto Gretter
Sangati Federico
Verena Lyding
Publication venue: Accademia University Press
Publication date: 01/01/2018

Università degli Studi di Napoli L'Orientale: CINECA IRIS